Building Bilingual Lexicons using Lexical Translation Probabilities via Pivot Languages
Authors
Abstract
This paper proposes a method for increasing the size of a bilingual lexicon derived from two other bilingual lexicons via a pivot language. This approach faces two main challenges, ambiguity and mismatch of terms; we target the latter by improving the utilization ratio of the bilingual lexicons. Given two bilingual lexicons for the language pairs Lf–Lp and Lp–Le, we compute lexical translation probabilities of word pairs using a statistical word-alignment model and term decomposition/composition techniques. We compare three approaches to generating the bilingual lexicon: exact merging, word-based merging, and our proposed alignment-based merging. Our method combines lexical translation probabilities with a simple language model to estimate the probabilities of translation pairs. The experimental results show that our method drastically increases the number of translation terms compared with exact and word-based merging. We also evaluate and discuss the quality of the translation outputs.
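To make the term-mismatch problem concrete, the following is a minimal sketch, not the paper's implementation. It assumes exact merging means pairing Lf and Le terms whose pivot terms are string-identical, and it uses the common pivot marginalization P(e|f) = Σ_p P(p|f)·P(e|p) as a stand-in for how lexical translation probabilities might be combined across the pivot. The function names, the toy entries, and the probability values are all hypothetical.

```python
from collections import defaultdict


def exact_merge(lex_fp, lex_pe):
    """Pair Lf and Le terms that share a string-identical pivot term (assumed definition)."""
    merged = defaultdict(set)
    for f_term, pivots in lex_fp.items():
        for p_term in pivots:
            for e_term in lex_pe.get(p_term, ()):
                merged[f_term].add(e_term)
    return dict(merged)


def pivot_probability(p_pf, p_ep):
    """Marginalize over pivot words: P(e|f) = sum over p of P(p|f) * P(e|p)."""
    p_ef = defaultdict(lambda: defaultdict(float))
    for f, pivots in p_pf.items():
        for p, prob_pf in pivots.items():
            for e, prob_ep in p_ep.get(p, {}).items():
                p_ef[f][e] += prob_pf * prob_ep
    return {f: dict(es) for f, es in p_ef.items()}


# Hypothetical toy lexicons (French -> English pivot -> German).
lex_fp = {"chien": ["dog"], "pomme de terre": ["potato"]}
lex_pe = {"dog": ["Hund"], "potatoes": ["Kartoffeln"]}

# "pomme de terre" is lost: "potato" and "potatoes" do not match exactly.
exact = exact_merge(lex_fp, lex_pe)

# Hypothetical word-level probabilities, as a word-alignment model might produce.
p_pf = {"chien": {"dog": 0.75, "hound": 0.25}}
p_ep = {"dog": {"Hund": 1.0}, "hound": {"Hund": 0.5, "Jagdhund": 0.5}}
probs = pivot_probability(p_pf, p_ep)
```

In this toy run, exact merging keeps only the `chien` entry, which illustrates why a low utilization ratio of the input lexicons limits the size of the merged lexicon; scoring candidates at the word level is one way to recover pairs that exact matching discards.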
Related resources
Building a Bilingual Lexicon Using Phrase-based Statistical Machine Translation via a Pivot Language
This paper proposes a novel method for building a bilingual lexicon through a pivot language by using phrase-based statistical machine translation (SMT). Given two bilingual lexicons for the language pairs Lf–Lp and Lp–Le, we treat these lexicons as parallel corpora. Then, we merge the two extracted phrase tables into one phrase table between Lf and Le. Finally, we construct a phrase-based SMT...
Bilingual Multi-Word Lexicon Construction via a Pivot Language
Bilingual multi-word lexicons are helpful for statistical machine translation systems to improve their performance. In this paper we present a method for constructing such lexicons in a resource-poor language pair such as Korean-French. By using two parallel corpora sharing one pivot language we can easily construct such lexicons without any external language resource like a seed dictionary. Th...
The Impact of Part-of-Speech Filtering on Generation of a Swedish-Japanese Dictionary Using English as Pivot Language
A common problem when combining two bilingual dictionaries to make a third, using one common language as a pivot language, is the emergence of false translations due to lexical ambiguity between words in the languages involved. This paper examines if the translation accuracy improves when using part-of-speech filtering of translation candidates. To examine this, two different Japanese-Swedish l...
Building A Chinese WordNet Via Class-Based Translation Model
Semantic lexicons are indispensable to research in lexical semantics and word sense disambiguation (WSD). For the study of WSD for English text, researchers have been using different kinds of lexicographic resources, including machine readable dictionaries (MRDs), machine readable thesauri, and bilingual corpora. In recent years, WordNet has become the most widely used resource for the study of...
Cross-language synonyms in the lexicons of bilingual infants: one language or two?
This study tests the widely-cited claim from Volterra & Taeschner (1978), which is reinforced by Clark's PRINCIPLE OF CONTRAST (1987), that young simultaneous bilingual children reject cross-language synonyms in their earliest lexicons. The rejection of translation equivalents is taken by Volterra & Taeschner as support for the idea that the bilingual child possesses a single-language system wh...